Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis
نویسنده
چکیده
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNNs as convolutive deep transform (CDT) devices to perform probabilistic re-synthesis. The CDTs operated directly in the time-domain. Our simulations demonstrate that very simple neural networks are capable of exploiting monaural and binaural information available in a cocktail party listening scenario.
منابع مشابه
Deep Transform: Cocktail Party Source Separation via Complex Convolution in a Deep Neural Network
Convolutional deep neural networks (DNN) are state of the art in many engineering problems but have not yet addressed the issue of how to deal with complex spectrograms. Here, we use circular statistics to provide a convenient probabilistic estimate of spectrogram phase in a complex convolutional DNN. In a typical cocktail party source separation scenario, we trained a convolutional DNN to re-s...
متن کاملProbabilistic Binary-Mask Cocktail-Party Source Separation in a Convolutional Deep Neural Network
Separation of competing speech is a key challenge in signal processing and a feat routinely performed by the human auditory brain. A long standing benchmark of the spectrogram approach to source separation is known as the ideal binary mask. Here, we train a convolutional deep neural network, on a twospeaker cocktail party problem, to make probabilistic predictions about binary masks. Our result...
متن کاملCocktail Party Processing via Structured Prediction
While human listeners excel at selectively attending to a conversation in a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance ...
متن کاملDeep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of vo...
متن کاملDeep Transform: Time-Domain Audio Error Correction via Probabilistic Re-Synthesis
In the process of recording, storage and transmission of time-domain audio signals, errors may be introduced that are difficult to correct in an unsupervised way. Here, we train a convolutional deep neural network to resynthesize input time-domain speech signals at its output layer. We then use this abstract transformation, which we call a deep transform (DT), to perform probabilistic re-synthe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1503.06046 شماره
صفحات -
تاریخ انتشار 2015